Hybridization normalization methods

ABSTRACT

The present invention includes methods of normalizing hybridization reactions that are designed to select normalization control genes, specifically the 5′-, 3′-, and middle-portions of these genes, that hybridize similarly to a probe array and that produce the most consistently linear curve of hybridization signal over a range of normalization control gene segment concentrations. These methods have applicability across a broad spectrum of hybridization formats.

RELATED APPLICATIONS

This application is related to U.S. provisional application No. 60/295,835, filed Jun. 6, 2001, which is herein incorporated by reference in its entirety.

FIELD OF THE INVENTION

The invention relates generally to methods for normalizing hybridization reactions and optimizing the selection of normalization controls.

BACKGROUND OF THE INVENTION

Nucleic acid hybridization-based methods have become prevalent in medical and biotechnological research and development, diagnostic testing, drug development and forensics. The reliability and utility of these nucleic acid hybridization-based methods depend on accurate and reliable methods for accounting for variations between analyses. For example, variations in hybridization conditions, label intensity, reading and detector efficiency, sample concentration and quality, background effects, and image processing effects each contribute to hybridization signal heterogeneity. Hegde et al. (2000) Biotechniques 29 (3): 548-562; Berger et al. (2000) WO 00/04188.

Normalization of hybridization procedures such as Northern blot and Dot Blot analyses has often relied on control hybridizations to housekeeping genes such as β-actin, glyceraldehyde-3-phosphate dehydrogenase, and the transferrin receptor gene. Eickhoff et al. (1999) Nucleic Acids Research 27 (22): e33; Spiess et al. (1999) Biotechniques 26 (1): 46-50. These methods, however, generally do not provide the linearity sufficient to detect small but significant changes in transcription or gene expression. Spiess et al. (1999) Biotechniques 26 (1): 46-50. In addition, the steady state levels of many housekeeping genes are susceptible to alterations in expression levels that are dependent on cell differentiation, nutritional state, and specific experimental and stimulation protocols. Eickhoff et al. (1999) Nucleic Acids Research 27 (22): e33; Spiess et al. (1999) Biotechniques 26 (1): 46-50; Hegde et al. (2000) Biotechniques 29 (3): 548-562; and Berger et al. (2000) WO 00/04188.

In addition to numerous assay-associated factors, such as variations in background, labeling, hybridization conditions and detection, characteristics of the hybridization control molecule itself, such as variations in base composition, probe length, secondary structure and ability to cross-hybridize with the probes or target nucleic acids, also contribute to the difficulty and imprecision of comparing results between analyses. The normalization of array format hybridizations has typically been conducted using full-length hybridization controls that are complementary to oligonucleotide probes contained on the array (Affymetrix GeneChip® Expression Analysis Manual). Full-length hybridization controls, however, increase the likelihood of control-specific background effects, and the normalization curves generated using full-length normalization controls may not achieve the linearity and reproducibility necessary for many of the emerging applications of array hybridization methodologies.

SUMMARY OF THE INVENTION

The present invention is based on the surprising discovery of methods for optimizing the normalization of hybridization reactions comprising a nucleic acid sample, the method comprising the step of adding at least one normalization control gene segment to the hybridization reaction corresponding to the 3′, 5′ and middle regions of at least one normalization control gene. The normalization controls of the present invention are selected from nucleic acids that are not present in the nucleic acid sample. Preferably, the normalization controls are selected from viral, prokaryotic or eukaryotic genes. In a preferred embodiment, the normalization control genes are selected from an Escherichia coli BioB, BioC, or BioD gene, a P1 bacteriophage cre gene, or a Bacillus subtilis dap, thr, trp, phe or lys gene.

The normalization control gene segments of the present invention are typically either DNA or RNA and may be produced by the polymerase chain reaction or cloning of the normalization control genes or segments into a vector and expression of the normalization control genes or segments in a host cell. RNA normalization control gene segments may be produced, for example, by in vitro transcription of the cloned normalization control genes or segments.

The methods of the present invention are applicable to any hybridization assay format. Preferred formats include formats where an oligonucleotide probe, complementary to the normalization control gene segments, is immobilized on a solid support such as filters, polyvinyl chloride dishes, silicon or glass beads or wafers in an array. Preferred arrays include high density or nucleic acid chip arrays. The oligonucleotide probes may be selected from nucleic acids isolated from human, non-humans, animals, microorganisms, bacteria, fungi, plants, and nucleic acids isolated from specific normal or diseased tissue.

The nucleic acid samples compatible with the methods of the instant invention include pooled nucleic acid samples, genomic DNA, cDNA, cRNA, mRNA, and polyA RNA.

The normalization control gene segments of the instant invention are selected by a method that comprises determining the non-specific cross-hybridization of the nucleic acid sample to the normalization control gene segments, wherein the normalization control gene segments that do not substantially cross-hybridize are selected. In another embodiment, the normalization controls of the present invention are selected by a method comprising analyzing a series of hybridization reactions, wherein each hybridization reaction of the series contains an increased concentration of the normalization control gene segment, and wherein the normalization control gene segments that produce the most consistently linear curve of hybridization signal over a range of normalization control gene segment concentrations are selected.

In a preferred embodiment, the methods of normalizing a hybridization reaction of the present invention comprise the steps of:

a) providing a normalization control comprising one or more normalization control gene segments, wherein said normalization control gene segments are mixed with the nucleic acid sample, and wherein said normalization control gene segments are prepared by a method comprising:
    i) selecting one or more candidate normalization control genes;
    ii) segmenting the candidate normalization control genes into 5′-, middle-, and 3′-segments, thereby producing candidate normalization control gene segments;
    iii) hybridizing said candidate normalization control gene segments to an oligonucleotide probe in the presence and absence of the nucleic acid sample;
    iv) determining the non-specific cross-hybridization of candidate normalization control gene segments to said oligonucleotide probe by determining the hybridization of candidate normalization control gene segments to probes other than those complementary to the candidate normalization control gene segments;
    v) repeating step (iii) at various concentrations of candidate normalization control gene segments; and
    vi) identifying and selecting those candidate normalization control gene segments that do not substantially cross-hybridize to said oligonucleotide probe.

In a more preferred embodiment, the methods of normalizing a hybridization reaction of the present invention comprise steps wherein the normalization control gene segments are prepared by a method further comprising the following steps:

a) preparing individual mixtures of nucleic acid samples and candidate normalization control gene segments wherein each individual mixture contains a different concentration of the candidate normalization control gene segments identified in step (vi);
b) hybridizing a mixture of step (a) to an oligonucleotide probe;
c) repeating step (b) with mixtures containing different concentrations of candidate normalization control gene segments;
d) identifying the candidate normalization control gene segments that produce the most consistently linear hybridization response over a range of candidate normalization control gene segment concentrations by measuring the hybridization of said candidate normalization control gene segments to oligonucleotide probes that are complementary to the normalization control gene segments over a range of candidate normalization control gene segment concentrations; and
e) producing a solution or composition containing one or more of the candidate normalization control gene segments of step (d) over a concentration range sufficient to produce a linear normalization curve.

In the most preferred embodiment, the methods of the present invention further comprise the steps of hybridizing a mixture of said nucleic acid sample and the solution of step (e) to said array, and quantifying the hybridization of said target or pool of nucleic acid sample to said array.

The methods of the present invention also contemplate using normalization control gene segments that are labeled with either a fluorescent, chemiluminescent, bioluminescent, colorimetric, or a light scattering label.

In another embodiment, the methods of the present invention further comprise the step of fragmenting the normalization control gene segments prior to use.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1. Standard curves for each normalization control gene segment hybridized to a GeneChip® at concentrations ranging from 0.5-100 pM.

FIG. 2. Standard curves generated from hybridization of normalization control gene cocktail 831, 849, and 7211 to various GeneChips®.

FIG. 3. Standard curve generated from hybridization of normalization control gene cocktail 7211, which contains BioB3′ at 75 pM and BioD3′ at 100 pM, to various GeneChips®.

DETAILED DESCRIPTION

The present inventors have developed methods of normalizing hybridization reactions that are designed to select normalization control genes, specifically the 5′-, 3′-, and middle-portions of these genes, that hybridize to a probe array and produce the most consistently linear hybridization signal over a range of normalization control gene segment concentrations. These methods have applicability across a broad spectrum of hybridization formats. Although any nucleic acid may serve as a normalization control, a careful analysis of the specific characteristics of any given normalization control will enable optimization of the linearity of the normalization control hybridization signal, thereby increasing both the accuracy and precision of the analyses. The normalization controls of the present invention may be selected from a variety of sources and different coding and non-coding regions. The identity of the normalization control ultimately selected will depend on the specific application and hybridization format in which the control will be used. The present invention is applicable to any normalization control or set of normalization controls that can be selected and prepared, by the methods of the present invention, for use in hybridization reactions of any format.

A. Hybridization Controls.

In addition to specific oligonucleotide probes which bind the nucleic acid sample, a hybridization format may contain one or more control probes. The control probes fall into three categories referred to herein as: (1) normalization control probes; (2) expression level control probes; and (3) mismatch control probes.

As used herein, “normalization controls” are polynucleotides, oligonucleotides or other nucleic acids that are added to a nucleic acid sample and include “normalization control genes” and “control gene segments”. As used herein, “normalization control probes” are oligonucleotides or other nucleic acid probes that are complementary to the normalization control genes or normalization control gene segments and are used to detect or quantitate the normalization control genes or normalization control gene segments in a nucleic acid sample.

As used herein, “normalization control gene segment(s)” is a portion of the “normalization control gene(s)”. Preferably, a normalization control gene segment comprises the 5′-, 3′- or middle-portion of the “normalization control gene”.

The signals obtained from the normalization controls after hybridization provide a control for variations in hybridization conditions, label intensity, “reading” efficiency and other factors that may cause the signal of a perfect hybridization to vary between arrays. In a preferred embodiment, signals (e.g., fluorescence intensity), read from all other probes in the array, are divided by the signal from the control probes, thereby normalizing the measurements.
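The division described above can be illustrated with a minimal sketch. The function name `normalize_signals`, the probe names, and the intensity values are hypothetical, and summarizing several control-probe signals by their median is an assumption made for illustration only; the text does not specify how multiple control signals are combined.

```python
from statistics import median


def normalize_signals(probe_signals, control_signals):
    """Divide each probe signal by a single control-derived scale factor.

    Assumption: the control-probe signals are summarized by their median.
    """
    if not control_signals:
        raise ValueError("at least one control signal is required")
    scale = median(control_signals)
    if scale <= 0:
        raise ValueError("control signal must be positive")
    return {probe: value / scale for probe, value in probe_signals.items()}


# Hypothetical fluorescence intensities (arbitrary units).
raw = {"gene_A": 1200.0, "gene_B": 300.0}
controls = [580.0, 600.0, 620.0]
normalized = normalize_signals(raw, controls)
print(normalized)
```

Because every array is scaled by the signal of its own controls, normalized values from different arrays can be compared directly even when overall label intensity or reading efficiency differs between them.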

As used herein, “expression level controls” are nucleic acids that hybridize specifically with constitutively expressed genes in the biological sample. Virtually any constitutively expressed gene provides a suitable target for expression level controls. Typical expression level control probes have sequences complementary to subsequences of constitutively expressed “housekeeping genes” including, but not limited to, the β-actin gene, the transferrin receptor gene, the glyceraldehyde-3-phosphate dehydrogenase gene (GAPDH), and the like.

As used herein, “mismatch control” refers to an oligonucleotide whose sequence is deliberately selected not to be perfectly complementary to a particular oligonucleotide probe. For each mismatch (MM) probe in a high-density array there typically exists a corresponding perfect match (PM) probe that is perfectly complementary to the same particular mismatch control sequence. The mismatch may comprise one or more bases.

While the mismatch(es) may be located anywhere in the mismatch probe, terminal mismatches are less desirable as a terminal mismatch is less likely to prevent hybridization of the target sequence. In a particularly preferred embodiment, the mismatch is located at or near the center of the probe such that the mismatch is most likely to destabilize the duplex with the mismatch control probe under the test hybridization conditions. Mismatch controls thus provide a control for non-specific binding or cross-hybridization of the control sequence to an oligonucleotide probe other than the one to which the mismatch control is directed. Mismatch controls also indicate whether a hybridization is specific or not.
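As an illustration of a centrally located mismatch, the following sketch derives a single-base mismatch sequence from a perfect match sequence. The function name `central_mismatch` and the example sequence are hypothetical, and replacing the central base with its complement is only one possible substitution, chosen here as an assumption.

```python
# Watson-Crick complements for DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def central_mismatch(perfect_match):
    """Return a copy of the sequence with its central base substituted.

    Assumption: the central base is replaced by its complement, which
    guarantees that the new base differs from the original one.
    """
    if not perfect_match:
        raise ValueError("sequence must not be empty")
    seq = perfect_match.upper()
    mid = len(seq) // 2
    return seq[:mid] + COMPLEMENT[seq[mid]] + seq[mid + 1:]


pm = "ACGTA"  # hypothetical perfect match sequence
mm = central_mismatch(pm)
print(pm, mm)
```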

As used herein, “perfect match probe” refers to a probe that has a sequence that is perfectly complementary to a particular control sequence. The perfect match probe is typically perfectly complementary to a portion (subsequence) of the control sequence. The perfect match probe can be a “test probe”, a “normalization control” probe, an expression level control probe and the like. A perfect match control, however, is distinguished from a “mismatch control.”

1. Selection of Normalization Controls.

The nucleic acids of the normalization controls of the present invention can be obtained from any source. A preferred source is animal nucleic acids, and in some formats a more preferred source is human nucleic acids. Plant nucleic acids, and microbial nucleic acids, specifically including bacterial and fungal nucleic acids, are also preferred sources of normalization control nucleic acids. Although any nucleic acid may be utilized as a normalization control for any hybridization format, the normalization control for a particular hybridization reaction is preferably: 1) neither related to the family of sequences present in the nucleic acid sample nor their corresponding oligomeric probes; 2) identical to the sequence or subsequence of a normalization control probe that is included in the hybridization assay; and 3) easily synthesized or prepared.

As used herein, normalization control gene nucleic acids that meet the above criteria for a particular hybridization reaction are referred to as “candidate normalization control genes.” Following identification, the “candidate normalization control genes” are then segmented. In a preferred embodiment, these “normalization control gene segments” correspond to between about 95% and 75% of the normalization control gene; preferably between about 75% and 50% of the normalization control gene, and more preferably between about 50% and 25% or between about 25% and 5% of the normalization control gene. In another embodiment, the normalization control gene segments correspond to the 5′-, middle-, and 3′-regions of the normalization control gene. As used herein, “5′-region” of the normalization control gene refers to the about one-third of the normalization control gene that begins at the 5′-end of either the sense or anti-sense strand of the normalization control gene. As used herein, “middle-region” of the normalization control gene refers to the middle about one-third of either the sense or anti-sense strand of the normalization control gene. As used herein, “3′-region” of the normalization control gene refers to the about one-third of the normalization control gene that begins at the 3′-end of either the sense or anti-sense strand of the normalization control gene.
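The division of a candidate gene into 5′-, middle-, and 3′-regions of about one-third each can be sketched in silico as follows. The function name `segment_into_thirds` and the example sequence are hypothetical; the sketch assumes the sequence is written 5′ to 3′ and cuts at exact integer thirds, whereas segments prepared in practice are only approximately one-third of the gene.

```python
def segment_into_thirds(sequence):
    """Split a 5'->3' sequence into 5', middle, and 3' regions.

    Assumption: cut points are placed at integer thirds of the length,
    so the three regions differ in length by at most one base.
    """
    n = len(sequence)
    if n < 3:
        raise ValueError("sequence must contain at least three bases")
    first_cut = n // 3
    second_cut = (2 * n) // 3
    return {
        "5'": sequence[:first_cut],
        "middle": sequence[first_cut:second_cut],
        "3'": sequence[second_cut:],
    }


segments = segment_into_thirds("ATGCATGCA")  # hypothetical 9-base sequence
print(segments)
```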

The cross-hybridization of the candidate normalization control gene segments is analyzed by comparing the hybridization of the normalization control gene segments in the presence and absence of nucleic acid sample. As used herein, the terms “cross-hybridize(s)” and “cross-hybridization” refer to hybridization resulting from non-specific binding, or other interactions, between the labeled normalization control gene segment(s) and components of the hybridization reaction other than the normalization control probe(s) that is complementary to the normalization control gene segment(s) (e.g., the oligonucleotide probes, other non-complementary control probes, the substrate or matrix of the particular hybridization reaction, nucleic acid sample, etc.).

As used herein, “background” refers to signals associated with non-specific binding (cross-hybridization). In addition to cross-hybridization, background may also be produced by intrinsic fluorescence of the hybridization format components themselves. A single background signal can be calculated for the entire format, or a different background signal may be calculated for each nucleic acid sample or normalization control gene segment. In a preferred embodiment, background is calculated as the average hybridization signal intensity for the lowest 5% to 10% of the probes in an array, or, where a different background signal is calculated for each nucleic acid sample or normalization control gene segment, for the lowest 5% to 10% of the probes for each sample. Of course, one of skill in the art will appreciate that where the probes to a particular sample or normalization control gene segment hybridize well, and thus appear to specifically bind to a nucleic acid sample or normalization control gene segment, they should not be used in a background signal calculation. Alternatively, background may be calculated as the average hybridization signal intensity produced by hybridization to probes that are not complementary to any sequence found in the nucleic acid sample or normalization control gene segment (e.g., probes directed to nucleic acids of the opposite sense or to genes not found in the sample, such as bacterial genes where the sample is mammalian nucleic acids). In nucleic acid array formats, for example, background can be calculated as the average signal intensity produced by regions of the array that lack any probes at all.
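The preferred background calculation, the average intensity of the lowest 5% to 10% of the probes, can be sketched as follows. The function name `background_from_lowest`, the default fraction, and the intensity values are hypothetical, and rounding the probe count down (with a minimum of one probe) is an assumption.

```python
from statistics import mean


def background_from_lowest(intensities, fraction=0.05):
    """Average the lowest `fraction` of probe intensities.

    Assumption: the number of probes averaged is the fraction of the
    total rounded down, with at least one probe always included.
    """
    if not intensities:
        raise ValueError("at least one intensity is required")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ordered = sorted(intensities)
    count = max(1, int(len(ordered) * fraction))
    return mean(ordered[:count])


# Hypothetical intensities for 100 probes (arbitrary units).
intensities = [float(i) for i in range(1, 101)]
background = background_from_lowest(intensities, fraction=0.05)
print(background)
```

The same function can be applied per sample or per normalization control gene segment by passing only the intensities of the corresponding probes.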

As used herein, normalization control genes or normalization control gene segments that are “complementary” to one or more of the oligonucleotide probes used in the hybridization formats described herein, refers to normalization control genes or normalization control gene segments that are capable of hybridizing under stringent conditions to at least part of the oligonucleotide probe. Such hybridizable normalization control genes or normalization control gene segments will typically exhibit at least about 75% sequence identity at the nucleotide level to said probes, preferably about 80% or 85% sequence identity or more preferably about 90% or 95% or more sequence identity to said probes.

“Bind(s) substantially” refers to complementary hybridization between an oligonucleotide probe and a nucleic acid sample or normalization control gene segment and embraces minor mismatches that can be accommodated by reducing the stringency of the hybridization media to achieve the desired detection of the nucleic acid sample.

The phrase “hybridizing specifically to” refers to the binding, duplexing or hybridizing of a molecule substantially to or only to a particular nucleotide sequence or sequences under stringent conditions when that sequence is present in a complex mixture (e.g., total cellular DNA or RNA).

In order to determine the optimal concentration for use with each individual normalization control gene segment, a nucleic acid sample is mixed with one normalization control gene segment, at a particular concentration of the normalization control gene segment, and hybridized to the oligonucleotide probes according to the procedures described herein. Each normalization control gene segment is analyzed over a range of concentrations which, for example, may include about 0.1 pM to about 50 nM or include about 0.5 pM, 0.75 pM, 1.0 pM, 1.5 pM, 2 pM, 3 pM, 5 pM, 12.5 pM, 25 pM, 50 pM, 75 pM, 100 pM and 150 pM. The median intensity of the normalization control gene segments bound at each concentration is plotted so that the normalization control gene segments that hybridize to the probes and produce the most consistently linear curve of hybridization signal over a range of normalization control gene segment concentrations are selected. A linear correlation is any relationship between two variables (i.e., normalization control gene segment concentration and hybridization signal) such that a graphical plot of one variable against the other produces an approximately straight line. As used herein, “linear coefficient” refers to the degree to which the relationship of one variable to another produces a line with a slope equal to about 1.00. As used herein, “linear curve” refers to a line that has a linear coefficient of r=about 0.980 to about 1.000. As used herein, “consistently linear curve” refers to a series of linear curves, derived from a series of analyses of a nucleic acid sample, using the normalization controls of the present invention, wherein the linear curves generated from plotting the hybridization signal versus the concentration of the normalization control have linear coefficients between r=about 0.985 and r=about 1.000.
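The selection of segments by linearity can be sketched as follows. The sketch assumes that the coefficient r is computed as the Pearson correlation coefficient between concentration and median signal; the function names `pearson_r` and `select_linear_segments`, the segment names, and all signal values are hypothetical and are not data from the present invention.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("series must not be constant")
    return sxy / sqrt(sxx * syy)


def select_linear_segments(concentrations, signals_by_segment, r_min=0.980):
    """Keep the segments whose signal-versus-concentration r is >= r_min."""
    return [
        name
        for name, signals in signals_by_segment.items()
        if pearson_r(concentrations, signals) >= r_min
    ]


# Hypothetical concentrations (pM) and median signals (arbitrary units).
concentrations = [0.5, 1.0, 2.0, 5.0, 25.0, 100.0]
signals_by_segment = {
    "segment_A": [10.0, 20.0, 40.0, 100.0, 500.0, 2000.0],  # proportional
    "segment_B": [10.0, 20.0, 40.0, 100.0, 300.0, 400.0],   # saturating
}
selected = select_linear_segments(concentrations, signals_by_segment)
print(selected)
```

To assess whether a curve is consistently linear, the same calculation would be repeated for each analysis in a series and the resulting coefficients compared against the stricter threshold of about 0.985.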

2. Preparation of Normalization Controls.

Nucleic acids to be used as normalization control genes may be obtained from a variety of natural sources such as organisms, organs, tissues and cells. The sequences of known genes are in the public databases. The sequences of the genes in GenBank are expressly incorporated by reference. The complete genomes of several organisms are available at the National Center for Biotechnology Information (see http://www.ncbi.nlm.nih.gov/Entrez/Genome/org.html). Normalization control genes that are based on the sequences of these genes, for example, may be prepared by any commonly available method or obtained from the American Type Culture Collection (ATCC), Manassas, Va., for example, or other commercial sources. Normalization control genes of the present invention include single-stranded or double-stranded nucleic acid molecules, including RNA, DNA, cRNA and cDNA.

Sources of normalization control gene nucleic acids include prokaryotic cells, such as the bacterial cells of species of the genera Escherichia, Bacillus, Serratia, Salmonella, Staphylococcus, Streptococcus, Clostridium, Chlamydia, Neisseria, Treponema, Mycoplasma, Borrelia, Legionella, Pseudomonas, Mycobacterium, Helicobacter, Erwinia, Agrobacterium, Rhizobium, and Streptomyces. Sources of normalization control genes also include eukaryotic cells such as fungi, especially yeast, plants, protozoans, parasites, animals, insects, especially Drosophila, nematodes, especially Caenorhabditis elegans, and mammals, including humans.

The candidate normalization control genes can be digested with any commercially available restriction endonuclease or other cleaving agent, under conditions sufficient to produce a 5′-, middle-, and 3′-portion of a normalization control gene. Following isolation and purification, these resultant normalization control gene segments can be used directly, amplified by PCR methods or amplified by replication or expression from a vector. PCR techniques comprise the hybridization (annealing) of two primer oligonucleotides to a template nucleic acid and elongation of the oligonucleotide primers by a thermostable polymerase. Multiple cycles of polymerization, denaturation and annealing result in amplification of the template nucleic acid. (See, Mullis et al. (1987) Meth. Enzymol. 155: 335-350; U.S. Pat. No. 4,683,195; U.S. Pat. No. 4,683,202).

RNA or DNA can be produced by in vitro transcription from a template polynucleotide, using commercially available reagents and kits from New England Biolabs, Beverly, Mass.; Invitrogen Corporation, San Diego, Calif., or Ambion, Incorporated, Austin, Tex. To utilize in vitro transcription reactions, the desired template is constructed by operably linking a target polynucleotide sequence to a promoter that is recognized by polymerase to produce either DNA or RNA. Examples of promoters include: the T3 phage promoter; the T7 phage promoter; and the SP6 phage promoter. If the Ambion, Inc. MEGAscript™ T7 kit (Cat. No. 1334) is used, the polynucleotide sequence is operably linked to a T7 phage promoter.

Normalization control gene segments produced by the polymerase chain reaction (PCR), direct synthesis or restriction endonuclease digestion can be amplified by placing the normalization control gene segment in a vector according to established protocols. Sambrook et al. (1989) Molecular Cloning: A Laboratory Manual, Second Edition; DNA Cloning, Vols. I and II (D. N. Glover ed. 1985); Perbal (1984) A Practical Guide to Molecular Cloning; Gene Transfer Vectors for Mammalian Cells (J. H. Miller et al. eds. (1987) Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y.); Scopes, Protein Purification: Principles and Practice (2nd ed., Springer-Verlag); PCR: A Practical Approach, McPherson et al. eds. (1991) IRL Press. The resultant vectors can be used to transform bacterial cells by established protocols. See e.g., Sambrook et al. The transformed bacterial cells can be cultured according to established protocols. See e.g., Sambrook et al. The plasmid DNA from the overnight cultures can be isolated using QIAGEN plasmid kits and other standard procedures. See e.g., Sambrook et al. The isolated plasmid DNA is digested with an appropriate restriction endonuclease according to the manufacturer's protocols. Digestion of the isolated plasmid can be monitored by gel electrophoresis in a 1% agarose gel.

Normalization control genes and normalization control gene segments (i.e., synthetic oligo- and polynucleotides) can easily be synthesized by chemical techniques, for example, the phosphotriester method of Matteucci et al. ((1981) J. Am. Chem. Soc. 103: 3185-3191) or using automated synthesis methods. In addition, larger nucleic acids can readily be prepared by well known methods, such as synthesis of a group of oligonucleotides that define various modular segments of the normalization control genes and normalization control gene segments, followed by ligation of oligonucleotides to build the complete nucleic acid molecule.

The present invention further provides recombinant nucleic acid molecules that encode the normalization control genes and normalization control gene segments. As used herein, a “recombinant nucleic acid molecule” refers to a nucleic acid molecule that has been subjected to molecular manipulation in vitro. Methods for generating recombinant DNA (rDNA) molecules are well known in the art. See e.g., Sambrook et al. (1989); Perbal (1984); and Scopes (1991). In the preferred recombinant nucleic acid molecules, a nucleotide sequence that encodes a normalization control gene or a normalization control gene segment is operably linked to one or more expression control sequences and/or vector sequences.

The choice of vector and/or expression control sequences to which the normalization control gene or normalization control gene segment is operably linked depends directly, as is well known in the art, on the functional properties desired (e.g., the host cell to be transformed). A vector contemplated by the present invention is at least capable of directing the replication or amplification of the nucleotide sequence encoding the normalization control gene or normalization control gene segment.

In one embodiment, the vector containing a normalization control gene or normalization control gene segment will include a prokaryotic replicon, i.e., a DNA sequence having the ability to direct autonomous replication and maintenance of the recombinant DNA molecule extrachromosomally in a prokaryotic host cell, such as a bacterial host cell, transformed therewith. Such replicons are well known in the art. In addition, vectors that include a prokaryotic replicon may also include a gene whose expression confers a detectable marker such as a drug resistance. Typical bacterial drug resistance genes are those that confer resistance to ampicillin (Amp) or tetracycline (Tet).

Vectors that include a prokaryotic replicon can further include a prokaryotic or viral promoter capable of directing the expression (transcription) of the normalization control gene or normalization control gene segment in a bacterial host cell, such as E. coli. A promoter is a control element formed by a nucleotide sequence that permits binding of RNA polymerase and transcription to occur. Promoter sequences compatible with bacterial hosts are typically provided in plasmid vectors containing convenient restriction sites for insertion of a DNA segment of the present invention. Typical of such vector plasmids are pUC8, pUC9, pBR322 and pBR329 available from Biorad Laboratories (Richmond, Calif.), pPL and pKK23 available from Pharmacia, Piscataway, N.J.

Expression vectors compatible with eukaryotic cells, preferably those compatible with vertebrate cells, can also be used to express nucleic acid molecules that contain a nucleotide sequence that encodes a normalization control gene or normalization control gene segment. Eukaryotic cell expression vectors are well known in the art and are available from several commercial sources. Typically, such vectors provide convenient restriction sites for insertion of the desired nucleic acid segment. Typical of such vectors are pSVL and pKSV-10 (Pharmacia), pBPV-1/pML2d (International Biotechnologies, Inc.), pTDT1 (ATCC, #31255), the vector pCDM8 described herein, and other like eukaryotic expression vectors.

Eukaryotic cell expression vectors used to construct the recombinant molecules of the present invention may further include a selectable marker that is effective in a eukaryotic cell, preferably a drug resistance selection marker. A preferred drug resistance marker is the gene whose expression results in neomycin resistance, i.e., the neomycin phosphotransferase (neo) gene. Southern et al., J. Mol. Appl. Genet. (1982) 1:327-341. Alternatively, the selectable marker can be present on a separate plasmid, and the two vectors are introduced by cotransfection of the host cell, and selected by culturing in the presence of the appropriate drug for the selectable marker.

The present invention further provides host cells transformed with a nucleic acid molecule that encodes a normalization control gene or normalization control gene segment of the present invention. The host cell can be either prokaryotic or eukaryotic. Eukaryotic cells useful for replication of a normalization control gene or normalization control gene segment are not limited, so long as the cell line is compatible with cell culture methods and compatible with the propagation of the expression vector and expression of the normalization control genes or normalization control gene segments. Preferred eukaryotic host cells include, but are not limited to, yeast, insect and mammalian cells, preferably vertebrate cells such as those from a mouse, rat, monkey or human fibroblastic cell line.

Transformation of appropriate cell hosts with nucleic acid molecules encoding a normalization control gene or normalization control gene segment of the present invention is accomplished by well known methods that typically depend on the type of vector and host system employed. With regard to transformation of prokaryotic host cells, electroporation and salt treatment methods are typically employed. See, e.g., Cohen et al., Proc. Natl. Acad. Sci. USA (1972) 69:2110; Maniatis et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y. (1982); Sambrook et al. (1989); Perbal (1984); and Scopes (1991). With regard to transformation of vertebrate cells with vectors containing rDNAs, electroporation, cationic lipid or salt treatment methods are typically employed. See, for example, Graham et al., Virology (1973) 52:456; Wigler et al., Proc. Natl. Acad. Sci. U.S.A. (1979) 76:1373-76.

Successfully transformed cells, i.e., cells that contain a nucleic acid molecule encoding the normalization control gene or normalization control gene segment of the present invention, can be identified by well known techniques. For example, cells resulting from the introduction of a nucleic acid molecule of the present invention can be cloned to produce single colonies. Cells from those colonies can be harvested and lysed, and their nucleic acid content examined for the presence of the recombinant molecule using a method such as that described by Southern, J. Mol. Biol. (1975) 98:503, or Berent et al., Biotech. (1985) 3:208.

The present invention further provides methods for producing a normalization control gene or normalization control gene segment. In general terms, the production of a recombinant normalization control gene or normalization control gene segment typically involves the following steps.

First, a nucleic acid molecule is obtained that encodes a normalization control gene or normalization control gene segment. Said nucleic acid molecule is then preferably placed in an operable linkage with suitable control sequences, as described above. The expression unit is used to transform a suitable host and the transformed host is cultured under conditions that allow the production of the normalization control gene or normalization control gene segment. Optionally, the rDNA molecule is isolated from the medium or from the cells; recovery and purification of the normalization control gene or normalization control gene segment may not be necessary in some instances where some impurities may be tolerated.

Each of the foregoing steps can be done in a variety of ways. For example, the desired sequences may be obtained from genomic fragments and used directly in an appropriate host. The construction of vectors that are operable in a variety of hosts is accomplished using an appropriate combination of replicons and control sequences. The control sequences, vectors, and transformation methods are dependent on the type of host cell used to express the gene and were discussed in detail earlier. A skilled artisan can readily adapt any host system known in the art for use with the nucleotide sequences described herein to produce the normalization control genes or normalization control gene segments of the present invention.

The individual normalization control gene segments can be fragmented by chemical, mechanical or enzymatic methods that are well known in the art. See, e.g., Sambrook et al. (1989). Preferably, normalization control gene segment RNA is fragmented by magnesium ion-induced hydrolysis at alkaline pH and elevated temperature. Most preferably, RNA is fragmented in fragmentation buffer (40 mM Tris-acetate (pH 8.1); 100 mM potassium acetate; 30 mM magnesium chloride) at 95° C. for between 25 and 50 minutes.

The hybridized nucleic acids are typically detected by detecting one or more labels attached to the sample nucleic acids and the normalization controls. The available labels include but are not limited to: radioactive isotopes; fluorescent labels, such as fluorescein isothiocyanate, Texas red, rhodamine, fluorescein-12-deoxycytosine triphosphate, lissamine-5-deoxycytosine triphosphate, and the like; polypeptides that are detectable by antibodies; biotin that is detectable by labeled avidin; chemiluminescent labels; enzymes; substrates; cofactors; magnetic particles; heavy metal atoms; and spectroscopic labels. The labels may be incorporated by any of a number of means well known to those of skill in the art. (See, e.g., Lockhart et al., (1999) WO 99/32660; U.S. Pat. No. 3,817,837; U.S. Pat. No. 3,850,752; U.S. Pat. No. 3,939,350; U.S. Pat. No. 3,996,345; U.S. Pat. No. 4,277,437; U.S. Pat. No. 4,275,149; and U.S. Pat. No. 4,366,241).

The labels can be incorporated either during synthesis of the normalization control genes or normalization control gene segments or after synthesis of the normalization control genes or normalization control gene segments.

B. Assay or Hybridization Formats.

The present invention may be practiced with any hybridization assay format, including solution-based and solid support-based assay formats. As used herein, “hybridization assay format(s)” refers to the organization of the oligonucleotide probes relative to the nucleic acid sample. The hybridization assay formats of the present invention, for example, include assays where the nucleic acid sample is labeled with one or more detectable labels, assays where the probes are labeled with one or more detectable labels, and assays where the sample or the probes are immobilized. Hybridization assay formats include but are not limited to: Northern blots, Southern blots, dot blots, solution-based assays, branched-DNA assays, microarrays and biochips.

As used herein, a “probe” or “oligonucleotide probe” is defined as a nucleic acid capable of binding to a nucleic acid sample or normalization control gene segment of complementary sequence through one or more types of chemical bonds, usually through complementary base pairing, usually through hydrogen bond formation. As used herein, a probe may include natural (i.e., A, G, U, C or T) or modified bases (7-deazaguanosine, inosine, etc.). In addition, the bases in probes may be joined by a linkage other than a phosphodiester bond, so long as it does not interfere with hybridization. Thus, probes may be peptide nucleic acids in which the constituent bases are joined by peptide bonds rather than phosphodiester linkages. The oligonucleotide probes comprising the oligonucleotide arrays can be obtained from any source. A preferred source is animal nucleic acids, and a more preferred source is human nucleic acids. Plant nucleic acids, and microbial nucleic acids, specifically including bacterial and fungal nucleic acids, are also preferred sources of oligonucleotide probes. In another embodiment of the invention, tissue-specific nucleic acids and disease-specific nucleic acids are the preferred sources of oligonucleotide probes.

Any solid surface to which oligonucleotides or nucleic acid sample can be bound, either directly or indirectly, either covalently or non-covalently, can be used. For example, solid supports for various hybridization assay formats can be filters, polyvinyl chloride dishes, silicon or glass based chips, etc. Glass-based solid supports, for example, are widely available, as well as associated hybridization protocols. (See, e.g., Beattie, WO 95/11755).

A preferred solid support is a high density array or DNA chip. Such an array contains an oligonucleotide probe of a particular nucleotide sequence at a particular location on the array. Each particular location may contain more than one molecule of the probe, but each molecule within the particular location has an identical sequence. Such particular locations are termed features. There may be, for example, 2, 10, 100, 1000 to 10,000, 100,000 or 400,000 such features on a single solid support. The solid support, or more specifically, the area wherein the probes are attached, may be on the order of a square centimeter.

1. Dot Blots.

The normalization controls and methods of the present invention may be utilized in numerous hybridization formats such as dot blots, dipstick, branched DNA sandwich and ELISA assays. Dot blot hybridization assays provide a convenient and efficient method of rapidly analyzing nucleic acid samples in a sensitive manner. Dot blots are generally as sensitive as enzyme-linked immunoassays. Dot blot hybridization analyses are well known in the art, and detailed methods of conducting and optimizing these assays are described in U.S. Pat. Nos. 6,130,042 and 6,129,828, and Tkatchenko et al. (2000) Biochimica et Biophysica Acta 1500: 17-30. Specifically, labeled or unlabeled nucleic acid sample is denatured and bound to a membrane (e.g., nitrocellulose), and is then contacted with unlabeled or labeled oligonucleotide probes. Buffer and temperature conditions can be adjusted to vary the degree of identity between the oligonucleotide probes and nucleic acid sample necessary for hybridization.

Several modifications of the basic Dot blot hybridization format have been devised. For example, Reverse Dot blot analyses employ the same strategy as the Dot blot method, except that the oligonucleotide probes are bound to the membrane and the nucleic acid sample is applied and hybridized to the bound probes. Similarly, the Dot blot hybridization format can be modified to include formats where either the nucleic acid sample or the oligonucleotide probe is applied to microtiter plates, microbeads or other solid substrates. Each of these variations on the basic Dot blot hybridization format may be used to analyze any nucleic acid sample, including for analysis of allelic variation between individuals, detection of single nucleotide polymorphisms (SNPs), genotyping and genetic mapping, and analysis of gene expression and differential gene expression between normal and diseased (e.g., pathological or metastatic) tissues or cells.

2. Membrane-Based Formats.

Although each membrane-based format is essentially a variation of the Dot blot hybridization format, several types of these formats are preferred. Specifically, the methods of the present invention may be used in Northern and Southern blot hybridization assays. Although the methods of the present invention are generally used in quantitative nucleic acid hybridization assays, these methods may be used in qualitative or semi-quantitative assays such as Southern blots, in order to facilitate comparison of blots. Southern blot hybridization, for example, involves cleavage of either genomic DNA or cDNA with restriction endonucleases followed by separation of the resultant fragments on a polyacrylamide or agarose gel and transfer of the nucleic acid fragments to a membrane filter. Labeled oligonucleotide probes are then hybridized to the membrane-bound nucleic acid fragments. In addition, intact cDNA molecules may also be used, separated by electrophoresis, transferred to a membrane and analyzed by hybridization to labeled probes. Northern analyses, similarly, are conducted on nucleic acids, either intact or fragmented, that are bound to a membrane. The nucleic acids in Northern analyses, however, are generally RNA.

3. Arrays.

High-throughput analysis of genetic sequences has been accomplished by the development of oligonucleotide and micro-array technology. Oligonucleotide probe arrays can be made and used according to any techniques known in the art (see, for example, Lockhart et al., (1996) Nat. Biotechnol. 14, 1675-1680; McGall et al., (1996) Proc. Nat. Acad. Sci. USA 93, 13555-13560). Array formats may be used for analysis of allelic variation between individuals, detection of single nucleotide polymorphisms (SNPs), genotyping and genetic mapping, and analysis of gene expression and differential gene expression between normal and diseased (e.g., pathological or metastatic) tissues or cells. Such probe arrays may contain two or more oligonucleotides that are complementary to or hybridize to one or more of the nucleic acids of the nucleic acid sample and/or the normalization control genes or normalization control gene segments. Such arrays may also contain oligonucleotides that are complementary or hybridize to at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 50, 70 or more of the nucleic acids of the nucleic acid sample.

Oligonucleotide probes for assaying the tissue or cell sample are preferably of sufficient length to specifically hybridize only to appropriate, complementary genes or transcripts. Typically the oligonucleotide probes will be at least 10, 12, 14, 16, 18, 20 or 25 nucleotides in length. In some cases longer probes of at least 30, 40, or 50 nucleotides will be desirable. The oligonucleotide probes of high density array chips include oligonucleotides that range from about 5 to about 45 or 5 to about 500 nucleotides, more preferably from about 10 to about 40 nucleotides and most preferably from about 15 to about 40 nucleotides in length. In other particularly preferred embodiments the probes are 20 or 25 nucleotides in length. In another preferred embodiment, probes are double or single strand DNA sequences. DNA sequences are isolated or cloned from natural sources or amplified from natural sources using natural nucleic acid as templates. These probes have sequences complementary to particular subsequences of the nucleic acid sample and/or normalization control gene segments. Thus, the oligonucleotide probes are capable of specifically hybridizing to the nucleic acid sample and/or the normalization control gene segments.

One of skill in the art will appreciate that an enormous number of array designs are suitable for the practice of this invention. The high density array will typically include a number of probes that specifically hybridize to the sequences of interest. (See WO 99/32660 for methods of producing probes for a given gene or genes.) Assays and methods of the invention may utilize available formats to simultaneously screen at least about 100, preferably about 1000, more preferably about 10,000 and most preferably about 1,000,000 different nucleic acid hybridizations.

The methods of this invention are also applicable to commercially available oligonucleotide arrays. A preferred oligonucleotide array may be selected from the Affymetrix, Inc. GeneChip® series of arrays, which includes the GeneChip® Human Genome U95 Set, GeneChip® Hu35K Set, GeneChip® HuGeneFL Array, GeneChip® Human Cancer G110 Array, GeneChip® Rat Genome U34 Set, GeneChip® Mu19K Set, GeneChip® Mu11K Set, GeneChip® Yeast Genome S98 Array, GeneChip® E. coli Genome Array, GeneChip® Arabidopsis Genome Array, GeneChip® HuSNP™ Probe Array, GeneChip® GenFlex™ Tag Array, GeneChip® HIV PRT Plus Probe Array, GeneChip® P53 Probe Array, and GeneChip® CYP450 Probe Array. In another embodiment, an oligonucleotide array may be selected from the Incyte Pharmaceuticals, Inc. GEM™ series of arrays, which includes the UniGEM™ V 2.0, Human Genome GEM 1, Human Genome GEM 2, Human Genome GEM 3, Human Genome GEM 4, Human Genome GEM 5, LifeGEM™ 1 Cancer/Signal Peptide, LifeGEM 2 Inflammation/Blood, Mouse GEM 1, Rat GEM 1 Liver/Kidney, Rat GEM 2 Central Nervous System, Rat GEM 3 Liver/Kidney, S. aureus GEM 1, C. albicans GEM 1, and Arabidopsis GEM 1.

Methods of data collection, image processing and data processing are well known in the art. Hegde et al. (2000) Biotechniques 29 (3): 548-562; Winzeler et al. (1999) Meth. Enzymol. 306 (1): 3-18; Tkatchenko et al. (2000) Biochimica et Biophysica Acta 1500: 17-30; Berger et al. (2000) WO 00/04188; Schuchhardt et al. (2000) Nucleic Acids Research 28 (10): e47; Eickhoff et al. (1999) Nucleic Acids Research 27 (22): e33. Micro-array data analysis and image processing software packages and protocols are available from BioDiscovery (http://www.biodiscovery.com/), Silicon Genetics (http://www.sigenetics.com), Spotfire (http://www.spotfire.com/), Stanford University (http://rana.Stanford.EDU/software/), National Human Genome Research Institute (http://www.nhgri.nih.gov/DIR/LCG/15K/HTML/img_analysis.html) and TIGR (http://www.tigr.org/softlab/). Micro-arrays can be scanned using numerous commercially available detectors and scanners, such as the ScanArray® 3000 (GSI Lumonics, Watertown, Mass., USA).

C. Hybridization.

As used herein, “nucleic acid hybridization” simply involves contacting a probe and nucleic acid sample under conditions where the probe and its complementary target can form stable hybrid duplexes through complementary base pairing (see Lockhart et al., (1999) WO 99/32660). The nucleic acids that do not form hybrid duplexes are then washed away, leaving the hybridized nucleic acids to be detected, typically through detection of an attached detectable label.

It is generally recognized that nucleic acids are denatured by increasing the temperature or decreasing the salt concentration of the buffer containing the nucleic acids. Under low stringency conditions (e.g., low temperature and/or high salt) hybrid duplexes (e.g., DNA-DNA, RNA-RNA or RNA-DNA) will form even where the annealed sequences are not perfectly complementary. Thus, specificity of hybridization is reduced at lower stringency. Conversely, at higher stringency (e.g., higher temperature or lower salt) successful hybridization requires fewer mismatches. One of skill in the art will appreciate that hybridization conditions may be selected to provide any degree of stringency. In a preferred embodiment, hybridization is performed at low stringency, in this case in 6×SSPE-T (0.005% Triton X-100) at 37° C., to ensure hybridization, and subsequent washes are then performed at higher stringency (e.g., 1×SSPE-T at 37° C.) to eliminate mismatched hybrid duplexes. Successive washes may be performed at increasingly higher stringency (e.g., down to as low as 0.25×SSPE-T at 37° C. to 50° C.) until a desired level of hybridization specificity is obtained. Stringency can also be increased by addition of agents such as formamide. Hybridization specificity may be evaluated by comparison of hybridization to the test probes with hybridization to the various controls that can be present (e.g., expression level control, normalization control, mismatch controls, etc.).

As used herein, the term “stringent conditions” refers to conditions under which a probe will hybridize to a complementary nucleic acid sample or normalization control gene segment, but with only insubstantial hybridization to other sequences. Stringent conditions are sequence-dependent and will be different under different circumstances. Longer sequences hybridize specifically at higher temperatures. Generally, stringent conditions are selected to be about 5° C. lower than the thermal melting point (T_(m)) for the specific sequence at a defined ionic strength and pH.

Typically, stringent conditions will be those in which the salt concentration is at least about 0.01 to 1.0 M sodium ion concentration (or other salts) at pH 7.0 to 8.3 and the temperature is at least about 30° C. for short probes (e.g., 10 to 50 nucleotides). Stringent conditions may also be achieved with the addition of destabilizing agents such as formamide.

In general, there is a tradeoff between hybridization specificity (stringency) and signal intensity. Thus, in a preferred embodiment, the wash is performed at the highest stringency that produces consistent results and that provides a signal intensity greater than approximately 10% of the background intensity. Thus, in a preferred embodiment, the hybridized array may be washed at successively higher stringency solutions and read between each wash. Analysis of the data sets thus produced will reveal a wash stringency above which the hybridization pattern is not appreciably altered and which provides adequate signal for the particular oligonucleotide probes of interest.

The “percentage of sequence identity” or “sequence identity” is determined by comparing two optimally aligned sequences or subsequences over a comparison window or span, wherein the portion of the polynucleotide sequence in the comparison window may optionally comprise additions or deletions (i.e., gaps) as compared to the reference sequence (which does not comprise additions or deletions) for optimal alignment of the two sequences. The percentage is calculated by determining the number of positions at which the identical residue (e.g., nucleic acid base or amino acid residue) occurs in both sequences to yield the number of matched positions, dividing the number of matched positions by the total number of positions in the window of comparison and multiplying the result by 100 to yield the percentage of sequence identity. Percentage sequence identity when calculated using the programs GAP or BESTFIT (see below) is calculated using default gap weights.
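The calculation described above can be illustrated with a short sketch. It assumes the two sequences have already been optimally aligned by some other tool and padded with a gap character; the function name `percent_identity`, the `-` gap symbol and the example sequences are hypothetical and are not taken from the GAP or BESTFIT programs.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Matched positions divided by window length, multiplied by 100.

    Both inputs must already be aligned (equal length, gaps inserted).
    A position counts as matched only when both sequences carry the
    same residue; a gap never counts as a match.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("comparison window is empty")
    matched = sum(
        1
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a == b and a != gap
    )
    return 100.0 * matched / len(aligned_a)


# Hypothetical 10-position comparison window with one mismatch and one gap.
print(percent_identity("ACGTACGT-A", "ACGAACGTTA"))  # 8 of 10 positions match
```

The sketch performs only the final counting step; finding the optimal alignment and applying gap weights are left to the alignment program.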

Homology or identity is determined by BLAST (Basic Local Alignment Search Tool) analysis using the algorithm employed by the programs blastp, blastn, blastx, tblastn and tblastx (Karlin et al., (1990) Proc. Natl. Acad. Sci. USA 87, 2264-2268 and Altschul, (1993) J. Mol. Evol. 36, 290-300, fully incorporated by reference), which are tailored for sequence similarity searching. The approach used by the BLAST program is to first consider similar segments between a query sequence and a database sequence, then to evaluate the statistical significance of all matches that are identified and finally to summarize only those matches which satisfy a preselected threshold of significance. For a discussion of basic issues in similarity searching of sequence databases, see Altschul et al., (1994) Nature Genet. 6, 119-129, which is fully incorporated by reference. The search parameters for histogram, descriptions, alignments, expect (i.e., the statistical significance threshold for reporting matches against database sequences), cutoff, matrix and filter are at the default settings. The default scoring matrix used by blastp, blastx, tblastn, and tblastx is the BLOSUM62 matrix (Henikoff et al., (1992) Proc. Natl. Acad. Sci. USA 89, 10915-10919, fully incorporated by reference). Four blastn parameters were adjusted as follows: Q=10 (gap creation penalty); R=110 (gap extension penalty); wink=1 (generates word hits at every wink^(th) position along the query); and gapw=16 (sets the window width within which gapped alignments are generated). The equivalent Blastp parameter settings were Q=9; R=2; wink=1; and gapw=32. A Bestfit comparison between sequences, available in the GCG package version 10.0, uses DNA parameters GAP=50 (gap creation penalty) and LEN=3 (gap extension penalty), and the equivalent settings in protein comparisons are GAP=8 and LEN=2.

D. Preparation of Nucleic Acid Samples.

As used herein, “nucleic acid sample” refers to any nucleic acid or pooled nucleic acid isolated from any source. A preferred nucleic acid sample contains genomic DNA or cDNA. A more preferred embodiment contains mRNA, cRNA, or polyA-RNA. The nucleic acid sample may be cloned or not and the nucleic acid may be amplified or not. The cloning itself does not appear to bias the representation of genes within a population. However, it may be preferable to use polyA-RNA as a source, as it can be used with fewer processing steps. As used herein, “nucleic acid sample” also refers to any nucleic acid of any origin that is applied to a partially or fully complementary nucleic acid(s), oligonucleotide(s), or oligonucleotide probe(s) in a hybridization reaction.

As is apparent to one of ordinary skill in the art, nucleic acid samples used in the methods of the present invention may be prepared by any available method or process. Methods of isolating total mRNA are also well known to those of skill in the art. For example, methods of isolation and purification of nucleic acids are described in detail in Chapter 3 of Laboratory Techniques in Biochemistry and Molecular Biology: Hybridization With Nucleic Acid Probes, Part I: Theory and Nucleic Acid Preparation, Tijssen, (1993) (editor) Elsevier Press. Such samples include RNA samples, but also include cDNA synthesized from an mRNA sample isolated from a cell or tissue of interest. Such samples also include DNA amplified from cDNA or genomic DNA, and RNA produced by in vitro transcription of the amplified DNA (cRNA). One of skill in the art would appreciate that it is desirable to inhibit or destroy RNase present in homogenates before homogenates can be used.

As used herein, “biological samples” refer to any biological tissue or fluid or cells from any organism as well as cells raised in vitro, such as cell lines and tissue culture cells. Frequently, the sample will be a “clinical sample” which is a sample derived from a patient. Typical clinical samples include, but are not limited to, sputum, blood, blood-cells (e.g., white cells), serum, plasma, spinal fluid, semen, lymph, tissue or fine needle biopsy samples, tumors, organs, urine, peritoneal fluid, and pleural fluid, or cells therefrom. Biological samples may also include sections of tissues, such as frozen sections or formalin fixed sections taken for histological purposes.

Tissue samples, following homogenization, and isolated cells are lysed by conventional methods that disrupt the cells and inactivate ribonucleases (RNase) present in the sample. For example, RNase is commonly inactivated by the addition of 4M guanidinium thiocyanate and β-mercaptoethanol. Inactivation of RNase by such solutions allows for isolation of intact RNA from cells and tissue samples. See, e.g., Sambrook et al. (1989); Perbal (1984); and Scopes (1991).

Total RNA may be extracted by any conventional method known in the art. Total RNA may be extracted, for example, using methods comprising guanidinium hydrochloride and cesium chloride (Glisin et al. (1974) Biochemistry 13: 2633; Ullrich et al. (1977) Science 196: 1313; Chomczynski et al. (1987) Anal. Biochem. 162: 156) or by methods comprising guanidinium hydrochloride and organic solvents (Strohman et al. (1977) Cell 10: 265; McDonald et al. (1987) Meth. Enzymol. 152: 219). Alternatively, RNA extraction kits are commercially available, for example, RNA STAT 60® (Tel-Test, Inc., Friendswood, Tex.), RNeasy® (QIAGEN), Tripure® (Boehringer Mannheim Biochemicals, Indianapolis, Ind.), Trizol (GIBCO Laboratories, Gaithersburg, Md.), and Tri Reagent® (Molecular Research Center, Inc., Cincinnati, Ohio).

The normalization controls of the present invention can be added to the nucleic acid sample from concentrated stock solutions to bring the normalization control to the desired concentration. Preferably, the normalization controls are added to the nucleic acid sample from a 2× stock solution; more preferably from a 10× stock solution; even more preferably from a 20× stock solution; and most preferably from a 100× stock solution.
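As a minimal sketch of the dilution arithmetic implied by these stock concentrations, the following computes how much stock to add for a given final volume. The function name `stock_and_diluent_ul` and the 200 μl final volume are illustrative assumptions, not values from the disclosure.

```python
def stock_and_diluent_ul(final_volume_ul: float, fold: float) -> tuple[float, float]:
    """Return (stock volume, remaining volume) in microliters.

    A stock at `fold` times the working concentration is diluted so the
    control ends up at 1x in `final_volume_ul` of hybridization mix.
    """
    if fold <= 1:
        raise ValueError("stock must be more concentrated than 1x")
    stock = final_volume_ul / fold
    return stock, final_volume_ul - stock


# Hypothetical 200 ul hybridization mix, for each stock strength in the text.
for fold in (2, 10, 20, 100):
    stock, rest = stock_and_diluent_ul(200.0, fold)
    print(f"{fold:>3}x stock: add {stock:.1f} ul to {rest:.1f} ul of sample mix")
```

A more concentrated stock contributes less volume to the mix, which is one plausible reason the higher-fold stocks are preferred.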

Without further description, it is believed that one of ordinary skill in the art can, using the preceding description and the following illustrative examples, practice the methods of the present invention. The following working examples, therefore, specifically point out the preferred embodiments of the present invention, and are not to be construed as limiting in any way the remainder of the disclosure.

EXAMPLES

Example 1 Preparation of Normalization Control Gene Segments

Clones containing the normalization control genes BioB, BioC, BioD, Cre and Dap were obtained from the American Type Culture Collection (ATCC), Manassas, Va. Specifically, pglks-bioB (ATCC 87487), pglks-bioC (ATCC 87488), pglks-bioD (ATCC 87489), pglks-cre (ATCC 87490), and pglbs-dap (ATCC 87486) were used to transform Escherichia coli. The transformed bacterial cells were cultured (50 ml) according to established protocols. The plasmid DNA from the overnight cultures was isolated using QIAGEN® Plasmid Kits and other standard procedures.

Fragments of the normalization control genes, namely normalization control gene segments, were produced and inserted into pBluescript II. The normalization control gene segments were amplified in E. coli, isolated and sequenced. The size and identity of the normalization control gene segments are summarized in Table 1.

TABLE 1

Control Gene Name   Insert Size (bp)   Organism      Product
BioB 3′             350                E. coli       biotin synthetase
BioB 5′             350                E. coli       biotin synthetase
BioB M              380                E. coli       biotin synthetase
BioC 3′             360                E. coli       biotin synthesis protein
BioC 5′             414                E. coli       biotin synthesis protein
BioD 3′             400                E. coli       dethiobiotin synthetase
Cre 3′              503                P1 phage      site-specific recombinase
Cre 5′              560                P1 phage      site-specific recombinase
Dap 3′              667                B. subtilis   dehydrodipicolinate reductase
Dap 5′              720                B. subtilis   dehydrodipicolinate reductase
Dap M               665                B. subtilis   dehydrodipicolinate reductase

10 μg of isolated plasmid DNA containing the normalization control gene segment DapM, Dap5′, Cre5′, BioB3′, BioBM, BioD3′, BioC5′ or Dap3′ was digested in a 50 μl reaction volume with XhoI, according to the manufacturer's protocols. BioC3′ and Cre3′ were both linearized with KpnI, which produces a 3′ overhang and thereby prevents the in vitro transcription reaction from continuously producing cRNA of the insert and plasmid. Controls were blunt-ended using T4 polymerase and examined for complete digestion on an E-Gel™ (Invitrogen, Calif.). Digestion of the isolated plasmid was monitored by gel electrophoresis in a 1% agarose gel, using 50 ng each of the uncut and linearized plasmid. Following complete digestion of the isolated plasmid with either XhoI or KpnI, the linearized plasmid DNA was phenol/chloroform/isoamyl alcohol extracted and precipitated with ethanol, according to established protocols. The linearized plasmid DNA was resuspended in 10 μl of DEPC-treated water and quantified by UV spectrophotometry at 260 nm. The final concentration of the purified DNA (OD 260/280 nm=1.8-2.0) was adjusted to 0.5 μg/μl.
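The quantification and adjustment steps can be sketched as follows. The sketch assumes the conventional approximation that one A260 unit corresponds to about 50 μg/ml of double-stranded DNA at a 1 cm path length; that factor, the function names and the example absorbance readings are assumptions for illustration and are not stated in the example above.

```python
# Assumed conventional conversion factor for double-stranded DNA (1 cm path).
DSDNA_UG_PER_ML_PER_A260 = 50.0


def dsdna_conc_ug_per_ul(a260: float, dilution_factor: float = 1.0) -> float:
    """Concentration of the undiluted DNA in ug/ul from an A260 reading."""
    return a260 * DSDNA_UG_PER_ML_PER_A260 * dilution_factor / 1000.0


def purity_ok(a260: float, a280: float, low: float = 1.8, high: float = 2.0) -> bool:
    """Check that the 260/280 ratio falls in the 1.8-2.0 range given in the text."""
    return low <= a260 / a280 <= high


def water_to_add_ul(conc_ug_per_ul: float, volume_ul: float,
                    target_ug_per_ul: float = 0.5) -> float:
    """Volume of water needed to bring the DNA down to the target concentration."""
    if conc_ug_per_ul < target_ug_per_ul:
        raise ValueError("sample is already below the target concentration")
    return conc_ug_per_ul * volume_ul / target_ug_per_ul - volume_ul


# Hypothetical readings taken on a 1:100 dilution of the 10 ul resuspension.
a260, a280 = 0.25, 0.13
conc = dsdna_conc_ug_per_ul(a260, dilution_factor=100)
print(f"purity ok: {purity_ok(a260, a280)}, concentration: {conc:.2f} ug/ul")
print(f"water to add: {water_to_add_ul(conc, 10.0):.1f} ul")
```

The sketch ignores the small volume consumed by the measurement itself, which would matter for a 10 μl sample in practice.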

In vitro transcription reactions were performed with Ambion's T7 MegaScript in vitro Transcription Kit, using 1-2 μg of a normalization control gene segment, at 37° C. for 6 hours. After completion of the reaction, the residual DNA was digested using 1 μl DNase. The cRNA produced was purified by using an RNeasy® Mini Kit (Qiagen, Calif.).

The cRNA was then fragmented (5× fragmentation buffer: 200 mM Tris-Acetate (pH 8.1), 500 mM KOAc, 150 mM MgOAc) for thirty-five minutes at 95° C. The appropriate fragmentation time was determined for each control by subjecting the normalization control gene segment cRNA to fragmentation of varying duration. For example, controls were fragmented between 25 and 50 minutes at 95° C. When the gene segments fragmented at 25, 29, 31, 33, 37, 39, 41, 43, 45 and 56 minutes were run on a PAGE gel, smear decreased with time and decreased most dramatically for the samples fragmented at 33 and 35 minutes. The Average Difference values on the GeneChip™ array platform also decreased with fragmentation time and decreased most dramatically after 33 minutes of fragmentation.
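As a quick consistency check, the 5× fragmentation buffer given here can be converted to its 1× working concentrations, which should agree with the 40 mM / 100 mM / 30 mM fragmentation buffer described earlier in the specification. The dictionary name, function names and the 40 μl reaction volume are hypothetical choices made for this sketch.

```python
# 5x stock concentrations in mM, as listed in the example.
FRAG_BUFFER_5X_MM = {
    "Tris-acetate (pH 8.1)": 200.0,
    "KOAc": 500.0,
    "MgOAc": 150.0,
}


def working_concentrations(stock_mm: dict[str, float], fold: float = 5.0) -> dict[str, float]:
    """Divide each stock concentration by the fold to get the 1x values."""
    return {component: conc / fold for component, conc in stock_mm.items()}


def buffer_volume_ul(final_volume_ul: float, fold: float = 5.0) -> float:
    """Volume of concentrated buffer needed for a reaction of the given size."""
    return final_volume_ul / fold


for component, conc in working_concentrations(FRAG_BUFFER_5X_MM).items():
    print(f"{component}: {conc:.0f} mM at 1x")

# Hypothetical 40 ul fragmentation reaction.
print(f"5x buffer needed: {buffer_volume_ul(40.0):.0f} ul")
```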

Bio-11-CTP and Bio-16-UTP nucleotides (Enzo Diagnostics) were added to the reaction to biotinylate the cRNA. After a 37° C. incubation for six hours, the labeled cRNA was cleaned up according to the RNeasy Mini kit protocol (QIAGEN).

Example 2 Nucleic Acid Sample Acquisition and Preparation.

With minor modifications, the nucleic acid sample preparation protocol followed the Affymetrix GeneChip® Expression Analysis Manual. Frozen tissue was first ground to powder using the Spex Certiprep 6800 Freezer Mill. Total RNA was then extracted using Trizol (Life Technologies). The total RNA yield for each sample (average tissue weight of 300 mg) was about 200-500 μg. Next, mRNA was isolated using the Oligotex mRNA Mini Kit (QIAGEN). Since the mRNA was eluted in a final volume of 400 μl, an ethanol precipitation step was required to bring the concentration to 1 μg/μl. Using 1-5 μg of mRNA, double stranded cDNA was created using the SuperScript Choice system (Gibco-BRL). First strand cDNA synthesis was primed with a T7-(dT₂₄) oligonucleotide. The cDNA was then phenol-chloroform extracted and ethanol precipitated to a final concentration of 1 μg/μl.

55 μg of fragmented cRNA was hybridized on the human 32K set and the HuGeneFL array for twenty-four hours at 60 rpm in a 45° C. hybridization oven, according to the Affymetrix protocol. The chips were washed and stained with Streptavidin Phycoerythrin (SAPE) (Molecular Probes) in Affymetrix fluidics stations. To amplify staining, the chips were washed with SAPE solution, stained with an anti-streptavidin biotinylated antibody (Vector Laboratories) followed by washing with SAPE solution. Hybridization to the probe arrays was detected by fluorometric scanning (Hewlett Packard Gene Array Scanner). Following hybridization and scanning, the microarray images were analyzed for quality control, looking for major chip defects or abnormalities in hybridization signal. After all chips passed quality control, the data was analyzed using Affymetrix GeneChip® software (v3.0), and Experimental Data Mining Tool (EDMT) software (v1.0).

Example 3 Cross-Hybridization Analysis of Normalization Controls.

Following fragmentation of the normalization control gene segments, the cRNAs were dissolved in MES buffer (101.6 mM MES; 1M NaCl; 0.01% Tween 20; 0.1 mg/ml herring sperm DNA) and the precise concentration for each control was determined. Three dilutions for each control were analyzed: 1:200, 1:100 and 1:50. Table 2 shows the three calculated concentrations, the average concentration, the standard deviation (StDev) and the relative standard deviation (RSD). In order to generate consistent normalization control batches, only those controls with RSD less than 6.5% were selected.

TABLE 2
Control   1:200 μg/ml   1:100 μg/ml   1:50 μg/ml   Avg μg/ml   StDev   RSD (%)
BioB-5′   880           840           840          853.3       23.1    2.71
Dap-M     480           440           460          460.0       20.0    4.35
Dap-5′    800           840           900          846.7       50.3    5.94
Cre-5′    1040          1040          1160         1080.0      69.3    6.42
BioB-3′   1040          960           1080         1026.7      61.1    5.95
BioB-M    1040          1080          1100         1073.3      30.6    2.85
BioD-3′   1040          1040          1160         1080.0      69.3    6.42
BioC-5′   1520          1440          1520         1493.3      46.2    3.09
BioC-3′   640           640           600          626.7       23.1    3.69
Dap-3′    640           680           700          673.3       30.6    4.54
Cre-3′    782           800           810          797.3       14.2    1.78
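The batch-consistency screen based on Table 2 can be illustrated with a short Python sketch. It is offered only as an illustration and is not part of the disclosed procedure. The function names are hypothetical, and the use of the sample (n - 1) standard deviation is an assumption, chosen because it appears to reproduce the StDev and RSD columns of Table 2.

```python
from statistics import mean, stdev

# Calculated concentrations (ug/ml) from Table 2, in the order
# 1:200, 1:100 and 1:50 dilution.
TABLE_2 = {
    "BioB-5'": (880, 840, 840),
    "Dap-M": (480, 440, 460),
    "Dap-5'": (800, 840, 900),
    "Cre-5'": (1040, 1040, 1160),
    "BioB-3'": (1040, 960, 1080),
    "BioB-M": (1040, 1080, 1100),
    "BioD-3'": (1040, 1040, 1160),
    "BioC-5'": (1520, 1440, 1520),
    "BioC-3'": (640, 640, 600),
    "Dap-3'": (640, 680, 700),
    "Cre-3'": (782, 800, 810),
}

RSD_CUTOFF = 6.5  # percent, the selection threshold stated in the text


def relative_std_dev(values):
    """Return (mean, sample standard deviation, RSD in percent)."""
    avg = mean(values)
    sd = stdev(values)  # assumption: sample (n - 1) standard deviation
    return avg, sd, 100.0 * sd / avg


def select_controls(table, cutoff=RSD_CUTOFF):
    """Keep only the controls whose RSD is below the cutoff."""
    return [name for name, vals in table.items()
            if relative_std_dev(vals)[2] < cutoff]


if __name__ == "__main__":
    for name, vals in TABLE_2.items():
        avg, sd, rsd = relative_std_dev(vals)
        print(f"{name:8s} avg={avg:7.1f} sd={sd:5.1f} rsd={rsd:4.2f}%")
    print("selected:", select_controls(TABLE_2))
```

Under these assumptions, every control listed in Table 2 falls below the 6.5% cutoff, which is consistent with all eleven controls being carried forward into the later examples.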

Fragmented nucleic acid sample alone, or nucleic acid sample mixed with normalization control gene segments, was hybridized on the human 32K set and the HuGeneFL array for twenty-four hours at 60 rpm in a 45° C. hybridization oven. The chips were washed and stained with SAPE solution in Affymetrix fluidics stations. Hybridization to the probe arrays was detected by fluorometric scanning (Hewlett Packard Gene Array Scanner). The cross-hybridization of the candidate normalization control gene segments was analyzed by comparing the binding of the normalization control gene segments in the presence and absence of nucleic acid sample. For example, each normalization control gene segment cRNA was hybridized to a GeneChip® array, in the absence of nucleic acid sample, to confirm that each segment hybridizes to the correct tile on the chip. In addition, nucleic acid sample in the absence of the normalization control cRNA was also hybridized under identical conditions to the GeneChip® array to confirm the absence of cross-hybridization to the normalization control probes on the array.

Example 4 Selection of Normalization Controls.

In order to determine the optimal concentration for use with each individual normalization control gene segment, nucleic acid samples were mixed with one concentration for each normalization control gene segment and hybridized to the chip according to the procedures described above. The task of assigning each normalization control gene segment to a specific concentration was complicated by an initial inconsistent performance of the controls (FIG. 1). To determine the linear performance of each normalization control gene segment and identify the optimal concentration for each control, we hybridized each normalization control gene segment at concentrations ranging from 0.5 to 100 pM (see Table 3). On each chip, every control is present at a different concentration, and the set of 12 chips ensures that each control is measured at each of the 12 concentrations. Each normalization control gene segment was analyzed over a range of concentrations that are summarized in Table 3. The median intensity of the normalization control gene segments bound at each concentration was plotted so that the normalization control gene segments that hybridize similarly to the array and produce the most consistently linear curve of hybridization signal over a range of concentrations were selected.

TABLE 3 (concentrations in pM)
Control   chip 1   chip 2   chip 3   chip 4   chip 5   chip 6   chip 7   chip 8   chip 9   chip 10  chip 11  chip 12
BioB 5′   0.5      0.75     1        1.5      2        3        5        12.5     25       50       75       100
Dap M     0.75     1        1.5      2        3        5        12.5     25       50       75       100      0.5
Dap 5′    1        1.5      2        3        5        12.5     25       50       75       100      0.5      0.75
Cre 5′    1.5      2        3        5        12.5     25       50       75       100      0.5      0.75     1
BioB 3′   2        3        5        12.5     25       50       75       100      0.5      0.75     1        1.5
BioB M    3        5        12.5     25       50       75       100      0.5      0.75     1        1.5      2
BioD 3′   5        12.5     25       50       75       100      0.5      0.75     1        1.5      2        3
BioC 5′   12.5     25       50       75       100      0.5      0.75     1        1.5      2        3        5
BioC 3′   25       50       75       100      0.5      0.75     1        1.5      2        3        5        12.5
Dap 3′    50       75       100      0.5      0.75     1        1.5      2        3        5        12.5     25
Cre 3′    75       100      0.5      0.75     1        1.5      2        3        5        12.5     25       50
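The layout of Table 3 follows a simple rotation: each successive control starts one step further along the same 12-point concentration series and wraps around. The following Python sketch reproduces that layout. It is only an illustration of the pattern visible in the table; the function name `rotation_design` is hypothetical and is not taken from the original work.

```python
# Concentration series (pM) and control order as listed in Table 3.
CONCENTRATIONS_PM = [0.5, 0.75, 1, 1.5, 2, 3, 5, 12.5, 25, 50, 75, 100]
CONTROLS = [
    "BioB 5'", "Dap M", "Dap 5'", "Cre 5'", "BioB 3'", "BioB M",
    "BioD 3'", "BioC 5'", "BioC 3'", "Dap 3'", "Cre 3'",
]


def rotation_design(controls, concentrations):
    """Map each control to its per-chip concentrations.

    The control in row r receives, on chip c (both 0-based), the
    concentration at index (r + c) modulo the length of the series.
    """
    n = len(concentrations)
    return {
        name: [concentrations[(row + chip) % n] for chip in range(n)]
        for row, name in enumerate(controls)
    }


if __name__ == "__main__":
    design = rotation_design(CONTROLS, CONCENTRATIONS_PM)
    for name, row in design.items():
        print(f"{name:8s}", *row)
```

With this arrangement no two controls share a concentration on the same chip, and every control visits all 12 concentrations across the 12 chips.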

A cocktail of normalization control gene segments at different concentrations is selected based on the linear performance of each cocktail as determined by the linear coefficient (R²) of each cocktail. Specifically, normalization control cocktails, such as those illustrated in Table 3, that display the highest linear performance, based on identifying those cocktails that have the highest average R² and the highest minimum R² values, were selected as normalization controls. In order to minimize the computation time necessary to evaluate all possible normalization control cocktails, the normalization control gene segments BioC5′, Dap3′, and Cre3′ were preassigned to 0.5, 1.0 and 3.0 pM, respectively, based on the linear performance of the individual controls. Furthermore, based on a similar analysis, BioC5′, Dap3′, Dap5′ and DapM each performed best at concentration assignments equal to or below 2 pM, whereas BioB3′ and BioD3′ performed best at either 75 or 100 pM. Three normalization control cocktails were prepared (see Table 4) and used on various GeneChip arrays. Specifically, cocktails 831, 849, and 7211 were each tested on HG-U95A arrays (5 different tissue types performed in triplicate), rat RG-U34 arrays (2 different tissue types performed in triplicate), Arabidopsis array (one tissue performed in triplicate) and the yeast YG-S98 array (performed in triplicate). The individual standard curves produced for each cocktail on these arrays are presented in FIG. 2.

TABLE 4 (concentrations in pM)
Control Name   cocktail 831   cocktail 849   cocktail 7211
BioB 5′        25             50             12.5
Dap M          2              2              2
Dap 5′         1.5            1.5            1
Cre 5′         12.5           12.5           25
BioB 3′        100            75             100
BioB M         50             25             50
BioD 3′        75             100            75
BioC 5′        1              1              1.5
BioC 3′        3              5              5
Dap 3′         0.5            0.5            0.5
Cre 3′         5              3              3
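The ranking of cocktails by average and minimum R² can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the procedure actually used: the fit is taken to be an ordinary least-squares line of signal against concentration on untransformed values (a log-log fit is an equally plausible reading), ties on average R² are broken by minimum R², and the function names and the demonstration signals are hypothetical placeholders rather than experimental data.

```python
from statistics import mean


def r_squared(x, y):
    """Coefficient of determination of the least-squares line y = a + b*x.

    For a straight-line fit this equals the squared Pearson correlation.
    """
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return (sxy * sxy) / (sxx * syy)


def rank_cocktails(curves):
    """Rank cocktails by (average R², minimum R²), best first.

    curves maps a cocktail name to a list of (concentrations, signals)
    pairs, one pair per hybridized array.
    """
    scored = []
    for name, arrays in curves.items():
        r2 = [r_squared(conc, signal) for conc, signal in arrays]
        scored.append((mean(r2), min(r2), name))
    scored.sort(reverse=True)  # assumption: average first, minimum as tie-break
    return [(name, avg, low) for avg, low, name in scored]


if __name__ == "__main__":
    # Synthetic placeholder curves, NOT measurements from the examples.
    conc = [0.5, 1, 3, 12.5, 50, 100]
    demo = {
        "linear": [(conc, [20 * c for c in conc]),
                   (conc, [22 * c + 5 for c in conc])],
        "saturating": [(conc, [2000 * c / (c + 40) for c in conc])],
    }
    for name, avg, low in rank_cocktails(demo):
        print(f"{name:10s} mean R2={avg:.3f} min R2={low:.3f}")
```

In the demonstration, the cocktail whose signal flattens at high concentration scores a lower R² than the strictly linear one, which mirrors the kind of high-concentration nonlinearity the selection is meant to penalize.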

The GeneChip array experiments with normalization control cocktails 831, 849, and 7211 indicate that cocktail 7211 exhibits the highest R². Only at concentrations greater than 50 pM did this normalization control cocktail display nonlinearity, in contrast to cocktail 849. Therefore, BioB3′ was assigned to 75 pM and BioD3′ to 100 pM, respectively, to further improve the linear performance of cocktail 7211. The standard curve based on the improved 7211 cocktail (R² = 0.985) is shown in FIG. 3.

Although the present invention has been described in detail with reference to examples above, it is understood that various modifications can be made without departing from the spirit of the invention. Accordingly, the invention is limited only by the following claims. All cited patents and patent applications and publications referred to in this application are herein incorporated by reference in their entirety.

1. A method of normalizing a hybridization reaction comprising a nucleic acid sample, comprising: a) adding at least one normalization control gene segment to the hybridization reaction corresponding to the 5′, middle or 3′ regions of at least one normalization control gene.
2. A method of claim 1, wherein the normalization control gene segment is not present in the nucleic acid sample.
3. A method of claim 2, wherein the normalization control genes are selected from the group consisting of: a) viral genes; b) prokaryotic genes; and c) eukaryotic genes.
4. A method of claim 2, wherein the hybridization reaction is conducted on a solid substrate.
5. A method of claim 4, wherein the solid substrate is an oligonucleotide array.
6. A method of claim 5, wherein the array comprises oligonucleotide probes that are complementary to the normalization control gene segments.
7. A method of claim 6, wherein the oligonucleotide probes of the array are selected from the group consisting of: a) human nucleic acids; b) non-human nucleic acids; c) animal nucleic acids; d) microbial nucleic acids; e) bacterial nucleic acids; f) fungal nucleic acids; g) tissue specific nucleic acids; h) disease specific nucleic acids; and i) plant nucleic acids.
8. The method of claim 7, wherein the normalization control gene segments are selected by a method comprising determining the non-specific cross-hybridization of the nucleic acid sample to the normalization control gene segments.
9. The method of claim 8, wherein the normalization control gene segments that do not substantially cross-hybridize are selected.
10. The method of claim 7, wherein the normalization control gene segments that are added to the hybridization reaction are selected by a method comprising analyzing a series of hybridization reactions wherein each hybridization reaction of the series contains an increased concentration of the normalization control gene segment.
11. The method of claim 10, wherein the normalization control gene segments that produce the most consistently linear curve of hybridization signal over a range of normalization control gene segment concentrations are selected.
12. The method of claim 1, wherein the normalization control gene segments are the 5′, middle and 3′ fragments of at least one normalization control gene.
13. A method of normalizing a hybridization reaction comprising a nucleic acid sample, comprising: a) providing a normalization control comprising one or more normalization control gene segments, wherein said normalization control gene segments are mixed with the nucleic acid sample, and wherein said normalization control gene segments are prepared by a method comprising: i) selecting one or more candidate normalization control genes; ii) segmenting the candidate normalization control genes into 5′-, middle-, and 3′-segments, thereby producing candidate normalization control gene segments; iii) hybridizing said candidate normalization control gene segments to an oligonucleotide probe in the presence and absence of the nucleic acid sample; iv) determining the non-specific cross-hybridization of candidate normalization control gene segments to said oligonucleotide probe by determining the hybridization of candidate normalization control gene segments to probes other than those complementary to the candidate normalization control gene segments; v) repeating step (iii) at various concentrations of candidate normalization control gene segments; and vi) identifying and selecting those candidate normalization control gene segments that do not substantially cross-hybridize to said oligonucleotide probe.
14. The method of claim 13, wherein the normalization control gene segments are prepared by a method further comprising the following steps: a) preparing individual mixtures of nucleic acid samples and candidate normalization control gene segments wherein each individual mixture contains a different concentration of the candidate normalization control gene segments identified in step (vi); b) hybridizing a mixture of step (a) to an oligonucleotide probe; c) repeating step (b) with mixtures containing different concentrations of candidate normalization control gene segments; d) identifying the candidate normalization control gene segments that produce the most consistently linear hybridization response over a range of candidate normalization control gene segment concentrations by measuring the hybridization of said candidate normalization control gene segments to oligonucleotide probes that are complementary to the normalization control gene segments over a range of candidate normalization control gene segment concentrations; and e) producing a solution containing one or more of the candidate normalization control gene segments of step (d) over a concentration range sufficient to produce a linear normalization curve.
15. The method of claim 14, further comprising the steps of: a) hybridizing a mixture of said nucleic acid sample and the solution of step (e) to said array; and b) quantifying the hybridization of said target or pool of nucleic acid sample to said array.
16. A method of claim 13, wherein the normalization control gene segment is not present in the nucleic acid sample.
17. A method of claim 16, wherein the normalization control genes are selected from the group consisting of: a) viral genes; b) prokaryotic genes; and c) eukaryotic genes.
18. A method of claim 16, wherein the hybridization reaction is conducted on a solid substrate.
19. A method of claim 18, wherein the solid substrate is an oligonucleotide array.
20. A method of claim 19, wherein the nucleotide array comprises oligonucleotide probes that are complementary to the normalization control gene segments.
21. A method of claim 20, wherein the oligonucleotide probes of the oligonucleotide array are selected from the group consisting of: a) human nucleic acids; b) non-human nucleic acids; c) animal nucleic acids; d) microbial nucleic acids; e) bacterial nucleic acids; f) fungal nucleic acids; g) tissue specific nucleic acids; h) disease specific nucleic acids; and i) plant nucleic acids.
22. The method of claim 1, wherein the normalization control gene segments are labeled.
23. The method of claim 22, wherein the label is selected from one or more of the group consisting of: a) a fluorescent label; b) a chemiluminescent label; c) a bioluminescent label; d) a radioactive label; e) colorimetric label; and f) a light scattering label.
24. The method of claim 1, wherein the normalization control gene segments are produced by the polymerase chain reaction.
25. The method of claim 1, wherein the normalization control gene segments are produced by cloning into a vector and expressing said normalization control gene segments in a host cell.
26. The method of claim 1, wherein the normalization control gene segments are DNA or RNA.
27. The method of claim 26, wherein the normalization control gene segments are RNA.
28. The method of claim 1, further comprising fragmenting the normalization control gene segments.
29. The method of claim 1, wherein the normalization control genes are selected from one or more of the group consisting of: a) an Escherichia coli BioB gene; b) an Escherichia coli BioC gene; c) an Escherichia coli BioD gene; d) a P1 bacteriophage Cre gene; e) a Bacillus subtilis dap gene; f) a Bacillus subtilis thr gene; g) a Bacillus subtilis trp gene; h) a Bacillus subtilis phe gene; and i) a Bacillus subtilis lys gene.
30. The method of claim 1, wherein the nucleic acid sample is selected from the group consisting of: a) pooled nucleic acid samples; b) genomic DNA; c) cDNA; d) cRNA; e) mRNA; and f) polyA RNA.
31. The method of claim 5, wherein the oligonucleotide probe array is immobilized on a solid support selected from the group consisting of: a) filters; b) polyvinyl chloride dishes; c) silicon or glass beads; and d) glass wafers.
32. The method of claim 5, wherein the oligonucleotide probe array is a high density array or nucleic acid chip.
33. A method of claim 29, wherein the normalization control genes are selected from the group consisting of BioB, Dap, Cre, BioD, and BioC.
34. A method of claim 29, wherein the normalization control genes consist of BioB, Dap, Cre, BioD, and BioC.
35. A method of claim 34, wherein the normalization control gene segments comprise BioB 5′, Dap M, Dap 5′, Cre 5′, BioB 3′, BioB M, BioD 3′, BioC 5′, BioC 3′, Dap 3′ and Cre 3′.
36. A method of claim 34, wherein the normalization control gene segments are a cocktail comprising BioB 5′, Dap M, Dap 5′, Cre 5′, BioB 3′, BioB M, BioD 3′, BioC 5′, BioC 3′, Dap 3′ and Cre 3′.
37. A method of claim 36, wherein the cocktail is cocktail 7211 in FIG. 4.
38. A method of claim 37, wherein the cocktail comprises normalization control gene fragments BioB 5′ at about 12.5 pM, Dap M at about 2 pM, Dap 5′ at about 1 pM, Cre 5′ at about 25 pM, BioB 3′ at about 100 pM, BioB M at about 50 pM, BioD 3′ at about 75 pM, BioC 5′ at about 1.5 pM, BioC 3′ at about 5 pM, Dap 3′ at about 0.5 pM and Cre 3′ at about 3 pM.